Open Problem: First-Order Regret Bounds for Contextual Bandits
Abstract
We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of $\tilde{O}(\sqrt{L_\star K \ln N})$, where there are $K$ actions, $N$ policies, and $L_\star$ is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret $\tilde{O}(L_\star \cdot \mathrm{poly}(K, \ln(N/\delta)))$. We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.
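For reference, here is a minimal formalization of the quantities above, assuming the standard adversarial contextual bandit protocol with losses $\ell_t(\cdot) \in [0,1]$, contexts $x_t$, played actions $a_t$, and a finite policy class $\Pi$ with $|\Pi| = N$ (this notation is ours, not quoted from the paper):

$$\mathrm{Regret}_T = \sum_{t=1}^{T} \ell_t(a_t) - \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi(x_t)), \qquad L_\star = \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi(x_t)).$$

A first-order bound replaces the usual $\sqrt{T}$ dependence with $\sqrt{L_\star}$, which is a strict improvement whenever the best policy incurs little loss.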
Similar Resources
Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
Regret bounds in online learning compare the player's performance to $L^\ast$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of a first-order regret bound replaces this with a scaling $\sqrt{L^\ast}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisf...
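In the full-information (experts) setting, the first-order phenomenon is already achievable with a small modification of Hedge. The Python sketch below uses a loss-dependent learning rate as one illustrative way to do this; the tuning rule and function names are ours, not taken from the cited paper.

```python
import numpy as np

def hedge_first_order(loss_matrix):
    """Hedge / multiplicative weights with a loss-dependent learning rate.

    loss_matrix: (T, N) array of expert losses in [0, 1].
    Shrinking the learning rate as the best expert's cumulative loss grows is
    one standard route to an O(sqrt(L* ln N) + ln N) regret bound in the
    full-information setting; this is an illustrative sketch only.
    """
    T, N = loss_matrix.shape
    cum_losses = np.zeros(N)   # cumulative loss of each expert
    player_loss = 0.0
    for t in range(T):
        best_so_far = cum_losses.min()
        eta = np.sqrt(np.log(N) / max(best_so_far, 1.0))
        weights = np.exp(-eta * (cum_losses - best_so_far))  # shift for stability
        probs = weights / weights.sum()
        player_loss += float(probs @ loss_matrix[t])
        cum_losses += loss_matrix[t]
    return player_loss - cum_losses.min()   # realized regret

# Example: a low-loss instance where sqrt(L*) is much smaller than sqrt(T).
rng = np.random.default_rng(0)
print(hedge_first_order(rng.uniform(0.0, 0.05, size=(5000, 10))))
```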
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...
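For concreteness, here is a compact sketch of the linear-payoff Thompson Sampling idea the abstract refers to, with a Gaussian posterior over the parameter vector. The class name, the variance-scaling hyper-parameter `v`, and the ridge prior are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

class LinearThompsonSampling:
    """Thompson Sampling with linear payoffs (Gaussian posterior sketch).

    Assumes reward_t = theta^T x_{t,a} + noise, where x_{t,a} is a
    d-dimensional feature vector for arm a at round t.
    """

    def __init__(self, d, v=1.0):
        self.B = np.eye(d)      # precision matrix (includes ridge prior)
        self.f = np.zeros(d)    # sum of reward-weighted features
        self.v = v              # posterior variance scaling

    def select(self, arm_features):
        """arm_features: (K, d) array; returns the index of the arm to play."""
        mu = np.linalg.solve(self.B, self.f)                    # posterior mean
        cov = self.v ** 2 * np.linalg.inv(self.B)               # posterior covariance
        theta_sample = np.random.multivariate_normal(mu, cov)   # sample parameters
        return int(np.argmax(arm_features @ theta_sample))

    def update(self, x, reward):
        """x: (d,) feature vector of the arm that was actually played."""
        self.B += np.outer(x, x)
        self.f += reward * x
```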
CBRAP: Contextual Bandits with RAndom Projection
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual d...
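The core trick the title refers to, compressing high-dimensional contexts with a random matrix before running an ordinary linear bandit, can be sketched as follows. The projection dimension and the reuse of the LinearThompsonSampling sketch above are illustrative; the paper's CBRAP algorithm comes with its own construction and guarantees.

```python
import numpy as np

def make_random_projection(d_high, m, rng=None):
    """Gaussian random projection matrix mapping R^{d_high} -> R^m.

    High-dimensional contexts are compressed once up front, and any linear
    contextual bandit is then run on the m-dimensional projected features.
    """
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d_high))

# Usage sketch (dimensions are made up for illustration):
# P = make_random_projection(d_high=10_000, m=32)
# bandit = LinearThompsonSampling(d=32)
# arm = bandit.select(arm_features_high @ P.T)   # project contexts, then select
```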
Stochastic Contextual Bandits with Known Reward Functions
Many sequential decision-making problems in communication networks such as power allocation in energy harvesting communications, mobile computational offloading, and dynamic channel selection can be modeled as contextual bandit problems which are natural extensions of the well-known multi-armed bandit problem. In these problems, each resource allocation or selection decision can make use of ava...
Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds on the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of t...